Binary text classification using genetic programming with crossover-based oversampling for imbalanced datasets

Abstract

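It is well known that classifiers trained using imbalanced datasets usually have a bias toward the majority class. In this context, classification models can present high classification performance overall and for the majority class, even when the performance for the minority class is significantly lower. This paper presents a genetic programming (GP) model with a crossover-based oversampling technique for imbalanced datasets in binary text classification. The aim of this study is to apply an oversampling technique to solve the imbalance issue and to improve the performance of the GP model that employs the proposed technique. The proposed technique employs a crossover operator for generating new samples in the minority class of a dataset. By the combination of the proposed technique and GP, classification performance was improved. It is shown that the proposed system outperforms all GP applications that use the original datasets without resampling. Moreover, the proposed system surpassed approaches that use the synthetic minority oversampling technique (SMOTE) and random oversampling. Further comparison with state-of-the-art approaches on five datasets in terms of F1-score shows the superior performance of the proposed approach.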
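The abstract does not spell out the crossover operator, so the following is only a minimal sketch of the general idea, assuming that each minority-class document is already encoded as a fixed-length numeric feature vector and that new samples are produced by single-point crossover between two randomly chosen minority samples. The function name `crossover_oversample` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence


def crossover_oversample(
    minority: Sequence[Sequence[float]],
    target_size: int,
    seed: int = 0,
) -> List[List[float]]:
    """Grow a minority class to `target_size` rows with single-point crossover.

    Assumption (not from the paper): every row is a fixed-length feature
    vector, and a child takes the head of one parent and the tail of another.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to cross over")
    n_features = len(minority[0])
    if n_features < 2:
        raise ValueError("need at least two features to pick a cut point")

    rng = random.Random(seed)
    rows = [list(row) for row in minority]
    synthetic: List[List[float]] = []
    while len(rows) + len(synthetic) < target_size:
        # Pick two distinct parents from the original minority samples only.
        parent_a, parent_b = rng.sample(rows, 2)
        # Cut strictly inside the vector so both parents contribute.
        cut = rng.randint(1, n_features - 1)
        synthetic.append(parent_a[:cut] + parent_b[cut:])
    # Original samples come first, followed by the synthetic ones.
    return rows + synthetic
```

A typical use would be `crossover_oversample(minority_rows, target_size=len(majority_rows))` to bring the minority class up to the size of the majority class before training. Unlike SMOTE, which interpolates between neighbours, this sketch only recombines feature values that already occur in the minority class.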

Similar resources

Generative Oversampling for Mining Imbalanced Datasets

One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, desired class distribution in the training data is achieved. Many resampling techniques have been proposed in the past, and the relationship between resampling and cost-sensitive learning has been well studied. Surprisingly...

Oversampling Method for Imbalanced Classification

Classification on imbalanced datasets is a pervasive problem in many data mining domains, and imbalanced classification has been a hot topic in the academic community. From the data level to the algorithm level, many solutions have been proposed to tackle the problems resulting from imbalanced datasets. SMOTE is the most popular data-level method and a lot of derivations based on it are developed to ...

Adaptive Oversampling for Imbalanced Data Classification

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, VIRTUAL, that comb...

Using Self-organizing Maps for Binary Classification with Highly Imbalanced Datasets

Highly imbalanced datasets occur in domains like fraud detection, fraud prediction, and clinical diagnosis of rare diseases, among others. These datasets are characterized by the existence of a prevalent class (e.g. legitimate sellers) while the other is relatively rare (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of a crucial importan...

Classification in Imbalanced Datasets

In this thesis we study the classification task in the presence of class imbalanced data. This task arises in many applications when we are interested in the under-represented (minority) classes. Examples of such applications are related to fraud detection, medical diagnosis and monitoring, text categorization, risk management, information retrieval and filtering. Although there exist many stan...

Journal

Journal title: Turkish Journal of Electrical Engineering and Computer Sciences

Year: 2023

ISSN: 1300-0632, 1303-6203

DOI: https://doi.org/10.55730/1300-0632.3978